(57) Abstract

A number of techniques permit a plurality of servers to provide access to information replicated on the servers and accessed by
connecting to a well published address. One approach involves an extension of multicasting in which source specific joins are
utilized to partition the address space to be serviced by a particular server. When a different address space allocation [...]
with the load balancing policy, a plurality of techniques are utilized to ensure that a connected user obtains the needed information. A
second approach uses an extension to the TCP protocol to enable dynamic TCP destinations. With this option, a sender provides a
tag and a cookie which a server can use. A server replies with a tag, a cookie and destination information. A security mechanism
is utilized to prevent the connection from being hijacked when a "change destination" message is sent. The third approach utilizes tag
switching. A pool of servers is supported behind at least one virtual IP address. The servers servicing that IP address set up a family of tag
switch trees, one for each server. When a virtual IP machine receives a tag-less packet, it directs one or more upstream routers to either
an actual IP address to which subsequent packets should be directed or to a tag switched tree to which the connection should be directed.
In this way, load balancing among servers handling connection requests to a well published network address can be achieved.

HIGHLY-DISTRIBUTED SERVERS FOR NETWORK APPLICATIONS

This invention is related [...] by
inventors Amit Gupta and [...], entitled
"[...] Techniques for Securing [...] Internet
Multicasting" (Attorney Docket No.
P2270/TJC), which is incorporated by reference
in its entirety.

BACKGROUND OF THE INVENTION

The invention relates to computer networks and,
more particularly, to the provision of highly-
distributed servers for network applications such as
web service on the Internet.

Exponential growth in the number of clients and
servers connected to large scale networks such as the
[...] is causing increased problems of scale. [...]
by network servers represent a major revenue stream
for companies. The dependence of such [...] network
services has led to [...] for distributed, [...] global
presence [...] uniform interface [...] graceful
degradation [...] problems in parts [...]. [...]
gateways and the [...] frequently used in construction
of networks are well known in the art.

Many researchers have explored different
mechanisms to improve the performance of network 
servers and systems. Some popular techniques include 
caching, and load balancing through the use of 
multiple servers. Some sites identify different 
servers with different names, each serving different 
geographic regions. Some approaches attempt to 
distribute access to a plurality of servers by using 
the domain name system (DNS) to randomize the server 
assigned when a connection request is directed to a 
particular company server. Some approaches attempt to 
utilize or mis-utilize the DNS by modifying the 
functionality of the DNS to poll each of the servers 
associated with a well known address to find out how 
loaded they are. The DNS then resolves the domain 
name to the IP address of the least loaded server. 



The Problems 

As demand for large network services has
increased out of proportion to the underlying
infrastructure available to support it, the usefulness
of such networks has been hampered by the congestion
and bottlenecks which result. Currently, it is not
uncommon for users to wait for tens of seconds and
sometimes even minutes before they can get any
information from the more popular (high-traffic) web
servers. These delays frustrate the users and make 
them less likely to use a network for obtaining 
desired resources and services. This wasted time and 
effort represents a loss of productivity for network 
users and the resulting revenue losses are 
particularly undesirable for commercial Internet 
sites. 

Typically, it would be desirable that a solution 
to the server problem have the following properties: 

1. The provider should be able to set up many
different web servers at locations all over a network
such as the Internet, without any restrictions (such
as requiring all web servers to be in the same [...]).

2. The clients (e.g. network users and their
browsers) should be able to send requests to a single,
well-advertised IP address.

3. Network servers should be able to, [...]
coordination, choose/dictate the clients that they are
willing to serve. They should not [...] to
listen to all the traffic from all clients. This
[...]/selection may change with time and such
[...] impose excessive additional burdens
on [...].

4. There should not be a single point of
failure. Rather, within a reasonable recovery time
[...], all requests should be directed to any
remaining server(s), although requests might
experience longer service times (graceful service
degradation).

SUMMARY OF THE INVENTION

In accordance with the invention, multiple
distributed servers can be provided which accomplish
load balancing and graceful degradation in the event
of [...] network failures. The load is distributed in
accordance with a [...]. [...] used individually
or in combination. The first involves an extension of
multicasting which can be characterized as
"manycasting." With manycasting, one can create
highly distributed clusters that provide network
services with all of the desirable properties
described above. In [...] specific "joins" and
"leaves" may be [...] interfering with the
communications. Using the first approach, four
different techniques are utilized [...] reallocating
connections when load [...]. They are (1) connection
reset, (2) state [...], [...] message forwarding and
[...].

The [...] approach to overcoming the problems of
[...] utilizes an extension to the SYN packet used
in TCP. With this option, the sender provides a tag
and a cookie which the receiver can use. If the
server is similarly equipped, it replies with a
message that includes a tag, a cookie and destination
information. When [...] overloaded, it
forwards all new connection requests to a less loaded
server in accordance with the distribution policy.
However, the same approach can be utilized to redirect
[...] connection from [...] to a different [...].
[...] the connection from being hijacked when a
[...].

The third [...] the problems [...]
servers is supported [...] virtual IP
address; virtual IP routers direct the packets for the
virtual IP addresses to the server pool. The servers
set up a family of tag switch trees (one for each real
server). When a virtual IP router receives a tag-less
packet, it forwards the packet to the actual IP
address of a selected server and informs one or more
upstream [...] address to which [...] directed. The
upstream [...] all packets for that connection with
a tag ID for the designated server. Thus,
subsequent packets will be correctly forwarded via tag
switching.

The advantages of the present invention will
[...] readily apparent to those skilled in the art
from the following detailed description, wherein only
the preferred embodiments of the invention are shown
and described, simply by way of illustration of the
best mode contemplated of carrying out the invention. 
As will be realized, the invention is capable of other 
and different embodiments, and its several details are 
capable of modifications in various obvious respects, 
all without departing from the invention. 
Accordingly, the drawings and description are to be 
regarded as illustrative in nature, and not as 
restrictive. 



BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of an exemplary 
network arrangement linking a plurality of sub- 
networks in accordance with one aspect of the 
invention . 

Figure 2 is an illustration of how a multicast 
address space may be partitioned into a private 
multicast address sub-space and public multicast 
address sub-space. 

Figure 3 is a database schema showing a typical
domain name server (DNS) record in accordance with
the prior art.

Figure 4 is a database schema of a DNS server 
modified in accordance with one aspect of the 
invention. 

Figure 5 is a diagram of extension to an Internet 
Group Management Protocol (IGMP) join request in 
accordance with one aspect of the invention. 

Figure 6 is a flow chart of an exemplary routing 
element process for determining whether to permit or 
reject an IGMP join request in accordance with one 
aspect of the invention. 

Figure 7A shows a prior art IGMP join request. 

Figure 7B shows a prior art extension to the IGMP 
join request of Figure 7A. 

Figure 7C shows an extension to prior art IGMP
join requests in accordance with one aspect of the
invention.

Figure 8 is a flow chart of a process for setting
up a private multicast in accordance with one aspect
of the invention.

Figure 9 is a block diagram showing an exemplary
first embodiment providing load sharing among servers.

Figure 10 is a flow chart of a process for load
sharing among servers in accordance with one aspect of
the invention.

Figure 11 is a partial pie chart showing a change
in the address space assigned to a server.

Figure 12 is a flow chart of a first process for
dealing with existing connections when a load
reallocation occurs.

Figure 13 is a block diagram of an exemplary
second embodiment for load sharing among servers in
accordance with another aspect of the invention.

Figure 14 is a flow chart of a second process for
dealing with existing connections when load
reallocation occurs.

Figure 15 is a flow chart of a third process for
dealing with existing connections when load
reallocation occurs.

Figure 16 is a flow chart of a fourth process for
dealing with existing connections when load
reallocation occurs.

Figure 17 is a block diagram of a tag switching
approach to load sharing among servers.

Figure 18 is a process for load sharing in
accordance with the invention using the arrangement of
Figure 17.

Figure 19 is a flow chart of a process for
changing policy in the arrangement of Figure 17.

Figure 20 is a block diagram of a dynamic TCP
destination approach to load sharing among servers.
In this approach, extensions to the TCP protocol are
required.

Figure 21 is a flow chart of a process for
switching servers in a dynamic TCP environment of
Figure 20 in a way that prevents hijacking of a
connection.

Figure 22A illustrates a computer of a type
suitable for carrying out the invention.

Figure 22B illustrates a block diagram of the
hardware of the computer of Figure 22A.

Figure 22C illustrates an exemplary memory medium
[...] 22B or 2210B in Figure 22A.

The detailed descriptions which follow may be
[...] those skilled [...] the substance of their work
[...] art.



A procedure is here, and generally, conceived to
be a self-consistent sequence of steps leading to a
desired result. These steps are those requiring
physical [...] physical quantities.
Usually, though not necessarily, these quantities take
the form of electrical or magnetic signals capable of
being stored, transferred, combined, compared, and
otherwise manipulated. It proves convenient at times,
principally for reasons [...], to refer to
these signals as bits, [...]
characters, terms, numbers, or the like. [...] similar
terms are to be associated with the appropriate physical
quantities and are merely convenient labels applied to
[...].

[...] the manipulations performed are often
referred to in terms, such as adding or comparing,
[...] are commonly associated [...]
performed by a human operator. No [...]
of the present invention; the operations are
[...] operations. Useful machines for performing
the [...] the present invention include general
purpose digital computers [...].

The present invention also relates to apparatus
for performing these operations. This apparatus may
be specially constructed for the required purpose or
it may comprise a general purpose computer as
selectively activated or reconfigured by a computer
[...] in the computer. The procedures
[...] are not inherently related to a
particular computer or other apparatus. Various
general purpose machines may be used with programs
written in accordance with the teachings herein, or it
may prove more convenient to construct more
specialized apparatus to perform the required method
steps. The required structure for a variety of these
machines will appear from the description given.

Figure 1 is a block diagram of an exemplary
network arrangement linking a plurality of sub-
networks in accordance with one aspect of the
invention. [...] plurality of sub-networks [...]
connected [...] routers [...]. [...] network
illustrated, DNS [...] sub-network 100B and a [...]
authority 150 is resident on sub-network [...]. One or
more senders 110 may be the [...]
information for the multicast to [...].

Figure 2 is an illustration of how a multicast
address space may be partitioned into a private
multicast address sub-space and public multicast
address sub-space.

The left hand side of Figure 2 represents the
total multicast address space. That space ranges from
224.0.0.0 (in Internet standard [...] notation)
to 239.255.255.255. Underneath the dotted
[...] representation is a parenthetical showing
[...] binary bits (bracketed) which corresponds to the
[...] notation. Each of the other components of
the dotted decimal notation represent the value of a
corresponding byte in a [...]. [...] notation of the
same binary value represents an [...] representation
of [...] only those binary values contained [...].

One of the important extensions to the
multicast address space provided in accordance with
the invention is a separation of the [...] address
space into two components, the first [...]
public multicast address [...] a private multicast
address space. [...] Figure 2, the public multicast
address space ranges [...]. Similarly the private
multicast address space ranges [...]
address space, [...] multicast address whether a
private multicast is undertaken or a public multicast
is undertaken.
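
The partition described for Figure 2 can be illustrated with a short sketch. It is not taken from the specification: the boundary between the public and private sub-spaces is not legible in this text, so the constant PRIVATE_SUBSPACE below is a purely hypothetical choice, and the function names are illustrative. Only the overall multicast range (224.0.0.0 to 239.255.255.255) comes from the description.

```python
import ipaddress

# Total IPv4 multicast space named in the text: 224.0.0.0 - 239.255.255.255.
MULTICAST = ipaddress.ip_network("224.0.0.0/4")

# ASSUMPTION: the real public/private boundary is not recoverable from the
# text; this upper half of the multicast space is a placeholder only.
PRIVATE_SUBSPACE = ipaddress.ip_network("232.0.0.0/5")


def classify(addr: str) -> str:
    """Decide from the address alone whether a multicast is public or private."""
    ip = ipaddress.ip_address(addr)
    if ip not in MULTICAST:
        return "not-multicast"
    return "private" if ip in PRIVATE_SUBSPACE else "public"


def first_byte_bits(addr: str) -> str:
    """Binary form of the first dotted-decimal component of an IPv4 address."""
    return format(int(ipaddress.IPv4Address(addr)) >> 24, "08b")
```

Because the decision depends only on the address, a routing element could make it without consulting any other state, which is the property the paragraph above relies on.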

Figure 3 is a database schema showing a typical
domain name server (DNS) record in accordance with
the prior art. As shown [...], [...]
address 300 is mapped against [...]
address 310 in respective columns of the database
[...].

Figure 4 is a database schema of a DNS server
modified in accordance with one aspect of the
invention. Columns [...] correspond
approximately to the columns in which entries 300 and
310 of Figure 3 [...]. [...] in 410,
instead of a unicast address, an [...] address
[...] included. Column 420 contains entries
describing the owner of the multicast address.
Typically this would be the person setting up the
multicast. Column 430 contains a public key to
[...] contains an [...].
When a DNS server [...] a query submitted with [...]
column 410 will result in [...] retrieve not only the
network address shown [...], the owner information
shown in 420 but also the public key [...] column 430
for the multicast session. This ability [...] is
described more [...].

Figure 5 is a diagram of extension to an Internet
Group Management Protocol (IGMP) join request in
accordance with one aspect of the invention. A header
500, and packet type shown in Field [...], [...]
requester IP address shown in field [...]
be part of prior art IGMP join [...]. [...]
extensions shown in [...]
invention, an optional timestamp may be placed in
Field 1 and a random key, [...]
generated [...]. The contents of Field 1,
Field 2 and Field 3 are encrypted or digested and the
digest encrypted and placed into [...]. [...]
Redundancy Check 510 (CRC) encompasses the [...]
join request. How this is [...] discussed more
hereinafter.

Figure 6 is a flow chart of an exemplary routing
element process for determining whether to permit or
reject an IGMP join request in accordance with one
aspect of the invention. When an extended IGMP join
request is received at a router (600), a determination is
made from the address whether the multicast is
public or private (605). If it is public (605-
public), the join is permitted and the join request is
forwarded to the next routing element along the path,
if any (640). If the multicast is private (605-
private), a check is made to determine whether the join
request submitted is a duplicate of a previous
request. One way an unauthorized user may attempt to
gain access to a multicast would be to duplicate a
join request submitted by a previous user. If the
submitted join request is a duplicate (610-Y), the
request is rejected. If it is not, a determination is
made whether the join request is timely (615). This
is a simple check to see that the join request is
appropriate for the day and time of the current
multicast session. This would prevent a user from
copying an earlier join request from an authorized
user in an attempt to gain access to the current
session. If the join request is not timely (615-N),
the request to join is rejected. If it is timely, a
check is made to determine whether the join request
came from a proper link. If it did not (620-N), the
join request is rejected. However, if it did, the
routing element will obtain the public key
corresponding to the private key utilized to encrypt
the IGMP extended join request (625). Preferably, the
public key is obtained from a DNS server, such as DNS
130 shown in Figure 1. Alternatively, the public key
could be obtained from a certification authority 150
shown in Figure 1. Field 4 of the extended IGMP join
request is then decrypted using the acquired public
key (630). The resulting information
decrypted from Field 4 should agree with Fields 1-3.
If it does, the join is permitted and the join request
is forwarded to the next routing element. If it does
not (635-N), the join request is rejected and the user
will be denied access to the multicast by the router.
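
The decision sequence of Figure 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and function names are invented, SHA-256 stands in for the unspecified digest, and the public-key operation is passed in as the callable `recover` (hypothetical) because the text does not name a cipher. The contents of Field 2 and Field 3 are treated as opaque bytes.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable, Set, Tuple


@dataclass(frozen=True)
class JoinRequest:
    group_is_private: bool
    arrival_link: str
    field1_timestamp: float  # optional timestamp (Field 1)
    field2: bytes            # opaque here (assumption)
    field3: bytes            # opaque here (assumption)
    field4: bytes = b""      # protected digest of Fields 1-3


def digest_fields(req: JoinRequest) -> bytes:
    """Digest of Fields 1-3; SHA-256 is an illustrative choice."""
    h = hashlib.sha256()
    h.update(repr(req.field1_timestamp).encode())
    h.update(req.field2)
    h.update(req.field3)
    return h.digest()


def seal(req: JoinRequest, protect: Callable[[bytes], bytes]) -> JoinRequest:
    """Sender side: place the private-key-protected digest into Field 4."""
    return replace(req, field4=protect(digest_fields(req)))


def permit_join(req: JoinRequest, seen: Set[bytes],
                session_window: Tuple[float, float],
                proper_links: Set[str],
                recover: Callable[[bytes], bytes]) -> bool:
    if not req.group_is_private:                      # 605: public multicast
        return True
    if req.field4 in seen:                            # 610: duplicate request
        return False
    start, end = session_window
    if not start <= req.field1_timestamp <= end:      # 615: not timely
        return False
    if req.arrival_link not in proper_links:          # 620: improper link
        return False
    if recover(req.field4) != digest_fields(req):     # 625-635: Field 4 check
        return False
    seen.add(req.field4)                              # remember accepted joins
    return True
```

In a real deployment `protect` and `recover` would be the private-key and public-key halves of a signature scheme; an identity function is enough to exercise the control flow.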

WO 99/30460 PCT/US98/26151

A third aspect of the invention is illustrated in
Figure 7A, Figure 7B and Figure 7C. Figure 7A shows
a prior art IGMP join request. A header 700 and a CRC
field form an envelope containing a join request 710
and address 720.

Figure 7B shows a prior art extension to the IGMP
join request of Figure 7A. The extension of the IGMP
join request of Figure 7B permits a list of senders
to be specified which are permitted to send to the
address requesting the join. Similarly, it includes
a list of senders prohibited from sending to the
address requesting the join. This permits a
participant in the multicast to inform routers to
selectively prohibit packets from undesirable or
disruptive sources from reaching the participant. It
also permits the user to specify the list of senders
from which the requesting station desires to receive
information. This allows the filtering out of packets
that the user does not desire to see.

Figure 7C shows an extension to prior art IGMP
join requests in accordance with one aspect of the
invention. Field 760 and Field 770 permit the use of
a list of 32-bit masks instead of a list of senders or
receivers. Thus, by tailoring a mask, groups of
addresses may be permitted to send to the address or
barred from sending to the address, merely by
specifying the bit-mask appropriate for the group and
the property desired. For example, the property may
be "permitted to send to this address" or "prohibited
from sending to this address".
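
A mask-based sender filter of the kind described for Figure 7C might look like the sketch below. The text only says that lists of 32-bit masks replace lists of senders; pairing each mask with a pattern to compare against, and letting a prohibition override a permission, are assumptions made here so the example is complete. All names are illustrative.

```python
import ipaddress
from typing import Iterable, Tuple

# ASSUMPTION: each list entry is a (pattern, mask) pair of 32-bit integers.
MaskEntry = Tuple[int, int]


def to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


def matches(addr: str, entries: Iterable[MaskEntry]) -> bool:
    """True if the address agrees with any entry on the bits set in its mask."""
    a = to_int(addr)
    return any((a & mask) == (pattern & mask) for pattern, mask in entries)


def may_send(addr: str, permitted: Iterable[MaskEntry],
             prohibited: Iterable[MaskEntry]) -> bool:
    # ASSUMPTION: a matching "prohibited" entry wins over a "permitted" one.
    if matches(addr, prohibited):
        return False
    return matches(addr, permitted)
```

With one permitted entry covering 192.0.2.0/24 and one prohibited entry covering 192.0.2.128/25, two short entries describe 256 senders that would otherwise have to be listed individually.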

Figure 8 is a flow chart of a process for setting
up a private multicast in accordance with one aspect
of the invention. A user desiring to set up a private
multicast first creates a private/public key pair for
the multicast (800). The sponsor or owner of the
multicast obtains a private multicast address (810)
for use during the multicast. This can either be a
permanent assignment or a temporary assignment
depending on need. The owner of the multicast or
other designated party may install the public key for
the multicast in the DNS information for the multicast
or in a certification server (820). The
private key for the multicast is distributed to
authorized participants in any of several known ways,
but preferably over the network (830). At [...]
the multicast is ready to begin (840). The receivers
[...] the multicast then
formulate an extended join request such as described
[...]. [...] the routing
element will make that determination using the public
key installed on the [...]
certification server [...]
satisfied that the [...]
multicast is genuine [...]
directing packets addressed [...]
to the user who submitted [...]
request. However, if the user [...]
(discussed in conjunction with Figure 6), the user will
[...].

Figure 9 is a block diagram showing an exemplary
first embodiment providing load sharing among servers.
[...] canonical server name is mapped to a multicast
[...] in the DNS. Each of [...]
multicast. However, when [...], for
example 930, listens to the multicast, [...] source
specific join and asserts a [...] which permits them
to receive packets from certain senders that are within
their [...]. That is,
connections are allocated to the servers 930 based on
[...] address. Thus, a
[...] to the multicast
address will be routed by routers [...] to only one of
[...].
Figure 10 is a flow chart of a process for load
sharing among servers in accordance with one aspect of
the invention. The IP address space to be serviced
[...] the servers is divided up into a number of
portions [...] one or more portions
[...] to service [...]. That is, if
[...] might wish to divide up
the address space so that each server [...]
than others. On [...]
the address space into [...]
every third portion to one of the servers [...]
space. The way in which the load is balanced or the
address space shared is a matter of an allocation
policy which is implemented as discussed [...].
After an initial assignment of address space, it may
be necessary to [...] on an as needed
basis (1020). This occurs, [...] example, when a
server became inoperative, or [...].

Figure 11 is a partial pie chart showing a change
in the address space assigned to a server.
Originally, 25% of the [...] reallocation (1020)
[...] reduced to 20%. As a
result of a reallocation, approximately [...] of the
address space which previously had been assigned to
the server under the old [...] now been
excluded under the new allocation. One [...]
problems in reallocation is [...]
connections currently being serviced by [...] server
when a reallocation of address space occurs.
Four approaches to handling existing connections
when reallocating load among the servers [...] discussed
hereinafter.

Figure 12 is a flow chart of a first process for
dealing with existing connections when a load
reallocation occurs. [...] allocation is
changing from 25% to 20% [...]
with Figure 11, [...] will [...]
indicating it no longer wishes to receive packets from
the old portion of the address space and,
substantially simultaneously, assert a source specific
join indicating that it desires to receive packets
from the new portions of address space allocated to it
(1200). The connections from the excluded portion of
the address space 1100 are simply closed down (1210).
The users whose connections have been closed down
will, in most cases, automatically attempt to
reconnect. When this occurs, the reconnect will be
directed to the server servicing the portion of the
address space to which that user's address belongs.
Thus, connections are redirected from the old server
to the new server with only a small disruption.
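
The address-space partitioning of Figure 10 and the connection-reset handling of Figure 12 can be sketched together. The sketch assumes, purely for illustration, that the source address space is cut into 16 portions by the top four bits of the client address and that portions are dealt out round-robin; neither detail is stated in the text, and the function names are invented.

```python
import ipaddress
from typing import Dict, Iterable, List, Set

NUM_PORTIONS = 16  # ASSUMPTION: granularity chosen for the example only


def portion_of(client_ip: str) -> int:
    """Portion index = top four bits of the client's IPv4 address."""
    return int(ipaddress.IPv4Address(client_ip)) >> 28


def assign_round_robin(servers: List[str]) -> Dict[str, Set[int]]:
    """Deal portions out in turn, e.g. every third portion with three servers."""
    table: Dict[str, Set[int]] = {s: set() for s in servers}
    for p in range(NUM_PORTIONS):
        table[servers[p % len(servers)]].add(p)
    return table


def connections_to_reset(open_clients: Iterable[str],
                         old_portions: Set[int],
                         new_portions: Set[int]) -> List[str]:
    """Clients whose portion the server gave up; these are closed (1210)."""
    lost = old_portions - new_portions
    return [c for c in open_clients if portion_of(c) in lost]
```

A server would close the connections returned by `connections_to_reset` after asserting its new source specific join; reconnecting clients are then routed to whichever server now owns their portion.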

Figure 13 is a block diagram of an exemplary
second embodiment for load sharing among servers in
accordance with another aspect of the invention.

The network illustrated corresponds to Figure 9;
however, it is expanded to show that the servers each
join a control message multicast group (1300). As a
result, the servers can send control messages to and
receive control messages from each other. This
control message multicast channel is also a convenient
way of distributing allocation policy update
information before executing a change in address space
for the various servers.

Figure 14 is a flow chart of a second process for
dealing with existing connections when load
reallocation occurs. As indicated in the discussion
of Figure 13, all servers listen to each other on a
control message multicast channel and send control
messages to each other (1400). When address space is
reallocated (1410), a server may begin receiving
packets for existing connections from address space
newly assigned to it, for which it does not have
connection state information (1420). When this
occurs, the packet will be forwarded over the control
message multicast channel to all servers and the old
server, which previously handled the connection, will
complete handling the connection until a convenient
breakdown [...] occurs. This arrangement has the
potential to create [...] control message multicast
traffic [...]. Rather than routing
such packets [...] multicast
channel, the old server, upon [...] packet from
[...] new server, may inform the [...]
IP address for the [...]
would avoid [...] control message
[...]. [...] old server may
[...] down the connection with the user
[...] transmitting an [...]
routed to a new [...]. This approach is one example
of combining techniques [...] reallocating connections.

Figure 15 is a flow chart of a [...] process for
dealing with [...] load
reallocation occurs. [...] before, each of the servers
is involved with a control [...]
(1500), and when address space is reallocated (1510),
a server, as discussed above, [...] receive a packet
[...] which it does not [...] the control
message multicast channel (1520), and the server
previously handling the connection sends state
[...] in accordance with the transferred state
information (1530).

[...] communicate across the switched core, tag
switching lets the routers located on the edge [...]
the intranet [...] tags that the switches can use to
[...] packets. This minimizes the processing needed
once the packet [...] switched network. A
tag-switching network [...] consist of tag switches
forming the core of an intranet and tag-edge routers
placed at the periphery to connect LANs and hosts to
[...].

In a tag-switching network, tags are assigned
based on the destination network, domain, or host.
Based on Layer 3 routing protocols such as OSPF (Open
Shortest Path First) and BGP (Border Gateway
Protocol), a router applies a tag to each packet of
the traffic flow. For an ATM-switched network, the
tag would become part of the link layer header in the
VCI (Virtual Circuit Identifier) field of the ATM cell
header. Packets are then switched through the network
with each switch simply swapping the incoming tag for
an appropriate forwarding tag rather than processing
each packet's contents to determine the path.

In general, a tag switch will try to populate its
Tag Information Base (TIB) with incoming and outgoing
tags for all the routes it can access, so that all
packets can be forwarded by simple label swapping.
Tag table information is exchanged using a
(lightweight) Tag Distribution Protocol (TDP). Tag
allocation is thus driven by topology (as defined by
routing), not by traffic.
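
The label-swapping step described above can be shown with a toy Tag Information Base. This is a sketch for illustration only: the class names, the port strings and the dictionary-based table are assumptions, and the population of the table (which the text attributes to a Tag Distribution Protocol) is reduced to a manual `install` call.

```python
from typing import Dict, NamedTuple, Tuple


class TibEntry(NamedTuple):
    out_tag: int   # tag written into the packet on the way out
    out_port: str  # interface the packet leaves on


class TagSwitch:
    """Toy tag switch: forwarding is a single table lookup on the incoming tag."""

    def __init__(self) -> None:
        self.tib: Dict[int, TibEntry] = {}

    def install(self, in_tag: int, out_tag: int, out_port: str) -> None:
        # Stand-in for an entry learned through tag distribution.
        self.tib[in_tag] = TibEntry(out_tag, out_port)

    def forward(self, in_tag: int, payload: bytes) -> Tuple[int, str, bytes]:
        # The payload is never inspected; only the tag is swapped.
        entry = self.tib[in_tag]
        return entry.out_tag, entry.out_port, payload
```

The point of the design is visible in `forward`: no routing decision is made per packet, so the cost at each core switch is one lookup regardless of packet contents.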

Figure 16 is a flow chart of a fourth process for
dealing with existing connections when load
reallocation occurs. Before address space
reallocation, each server will create a tag switched
path for each connection that would be lost to the
server during reallocation (1600). When the address
space reallocation is executed (1610), all newly
established connections will be routed to the proper
server based on the reallocation (1620). However, old
connections with a tagged path continue until a
convenient closing point and then the tagged path will
be broken down (1630).

Figure 17 is a block diagram of a tag switching
approach to load sharing among servers. Figure 17 is
similar to Figure 9 except that the user 1700 attempts
to connect to a well publicized virtual IP address
(1710) which is handled by one or more real machines.

Figure 18 is a process for load sharing in
accordance with the invention using the arrangement of
Figure 17. Each of the servers shown in Figure 17
sets up a tag switched tree for routing in the
network (1800). When a virtual IP router gets a
tagless packet, it selects a server and forwards the
tagless packet to the selected server (1810). The
virtual IP router then informs the upstream router
1710 (and this router can in turn inform some or all
of its upstream routers) to mark all packets from the
user with the tag ID of the designated server (1820).
The tag ID routing will supersede other routing and
all future packets from the user will go directly to
the selected server (1830).
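
The Figure 18 sequence can be sketched as two cooperating objects. Everything named here is hypothetical: the text does not say how the virtual IP router selects a server, so simple rotation is assumed, and the upstream router is modelled as a dictionary from source address to tag ID.

```python
from typing import Dict, Optional


class UpstreamRouter:
    """Marks packets from a known source with the tag of its designated server."""

    def __init__(self) -> None:
        self.tag_for_source: Dict[str, int] = {}

    def mark(self, source: str, tag_id: int) -> None:      # step 1820
        self.tag_for_source[source] = tag_id

    def tag_packet(self, source: str) -> Optional[int]:
        # None means the packet stays tag-less and goes to the virtual IP router.
        return self.tag_for_source.get(source)


class VirtualIpRouter:
    def __init__(self, server_tags: Dict[str, int], upstream: UpstreamRouter) -> None:
        self.server_tags = server_tags   # one tag switched tree per server (1800)
        self.servers = list(server_tags)
        self.upstream = upstream
        self.next_index = 0

    def handle_tagless(self, source: str) -> str:
        # ASSUMPTION: round-robin stands in for the unspecified selection policy.
        server = self.servers[self.next_index % len(self.servers)]
        self.next_index += 1
        self.upstream.mark(source, self.server_tags[server])
        return server                     # the packet is forwarded here (1810)
```

After the first packet, `tag_packet` returns the server's tag for that source, so later packets bypass the virtual IP router, matching step 1830.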

Figure 19 is a flow chart of a process for 
changing policy in the arrangement of Figure 17. 
Servers periodically inform the virtual IP routers of 
the senders (or address space) they will service. 
Alternatively, they will inform the servers of their 
activity or load levels (1900) . The virtual IP router 
will implement the allocation policy by directing 
addresses to the tag for the appropriate server 
(1910). Thus, the virtual IP router will control the 
tag applied to a service request and therefore control 
the traffic directed to individual servers. 

Figure 20 is a block diagram of a dynamic TCP
destination approach to load sharing among servers.
In this approach, extensions to the TCP protocol are
required. A user 2000 will forward a synchronization
(SYN) packet, the specification of which is extended
to provide for the possibility of sending a tag and a
cookie to a server, such as S1 (2010). The server to
which the user is originally assigned responds with a
SYN-ACK packet, the specification of which has also
been extended to permit a tag, cookie, and destination
information (2040) to be added; the SYN-ACK packet
contains the same tag and cookie value that the client
sent in the SYN packet. The server thus responds to
the tag and the cookie from the client by sending back
the same tag and cookie in the extended SYN-ACK packet
as was received in the extended SYN packet. The
client uses the tag-cookie values to match the SYN-ACK
to the connection. If the loading on server S1
becomes excessive, server S1 may transfer a connection
to server SN (2020) and the server SN will send to the
user 2000 a packet 2050 containing the tag for the
connection, the cookie and new destination
information.
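
The tag and cookie exchange of Figure 20 can be sketched as follows. This models only the matching logic, not real TCP segments or options: the Segment fields, the class and function names and the 8-byte random cookie are assumptions for illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for an extended TCP segment."""
    kind: str              # "SYN", "SYN-ACK" or "CHANGE-DEST"
    tag: int
    cookie: bytes
    destination: str = ""


class Client:
    def __init__(self):
        self.connections = {}  # (tag, cookie) -> current destination

    def send_syn(self, tag, server):
        cookie = secrets.token_bytes(8)  # cookie size is an assumption
        self.connections[(tag, cookie)] = server
        return Segment("SYN", tag, cookie)

    def receive(self, segment):
        # Match the segment to a connection by its tag-cookie values.
        key = (segment.tag, segment.cookie)
        if key not in self.connections:
            return False  # unknown tag or cookie: ignore the segment
        if segment.destination:
            self.connections[key] = segment.destination
        return True


def server_reply(syn, destination):
    # The server echoes the same tag and cookie it received in the SYN.
    return Segment("SYN-ACK", syn.tag, syn.cookie, destination)
```

A packet from server SN that carries the connection's tag and cookie together with new destination information updates the destination in this sketch, while a packet with an unknown cookie is ignored.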

Figure 21 is a flow chart of a process for
switching servers in a dynamic TCP environment in a
way that prevents hijacking of a connection. An
administrator or other responsible person of the
sponsoring organization which runs the servers
responding to connection requests to a well published
address creates a public key - private key pair and
distributes the private key to all servers (2100).
The public key of the public - private key pair is
installed in the DNS record for the canonical name/IP
address entry in the DNS server (2110).
Alternatively, rather than installing the public key
in the DNS record, it can be obtained from an
authentication server, trusted third party or the
like. When a server sends the change destination
message to the user, it authenticates the new
destination information by encrypting it with the
private key (2120). The client then verifies the new
destination information by decrypting it with the
public key installed on the DNS server (2130). In
this manner, the client can verify that the change of
information originated from an authentic source.
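
The authentication step can be illustrated with a toy example. The sketch below uses textbook RSA with deliberately tiny parameters so that it runs with the standard library alone; the key values, the function names and the choice to apply the private key to a SHA-256 digest of the destination information (rather than to the information itself) are all assumptions, and nothing this small is secure in practice.

```python
import hashlib

# Toy textbook-RSA key pair (p = 61, q = 53). Illustration only: a real
# deployment would use a vetted cryptographic library and large keys.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def _digest(message: bytes) -> int:
    # Reduce a SHA-256 digest into the toy modulus range.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign_destination(message: bytes) -> int:
    # Server side: apply the private key to the digest (step 2120).
    return pow(_digest(message), D, N)


def verify_destination(message: bytes, signature: int) -> bool:
    # Client side: apply the public key and compare digests (step 2130).
    return pow(signature, E, N) == _digest(message)
```

A client holding only the public values N and E can check that new destination information was produced by a holder of the private exponent, which is the property the process relies on.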

Figure 22A shows a computer architecture which is
suited for either a user workstation, for a controller
for a switching node, for a routing element or for use
as a server. However, when configured as a routing
element, I/O devices will normally only be attached
during servicing. When configured as a router, a
plurality of communications interfaces or ports 2285
will normally be provided, one for each port. When
configured as a controller for a switch at a switching
node, a hardware interface will be provided to link
the bus 2250 with a switching matrix. When configured
as a router or a controller, the computer may be
installed as a board in an equipment rack rather than
being a standalone unit as shown. When configured as
a [illegible] server, the computer may commonly
[illegible], although other packaging is possible.
[illegible] processing unit 2200 having disk drives
[illegible]. Disk drive indications 2210A and
[illegible] number of disk drives [illegible]
accommodated by the [illegible]. [illegible] include
a floppy disk drive [illegible], a hard disk drive
(not shown externally) and a [illegible] indicated by
slot 2210B. The number and type of drives [illegible],
typically, with different computer configurations.
The computer has the [illegible] upon which
information is displayed. [illegible] workstation
from Sun Microsystems, Inc.

Figure 22B illustrates a block diagram of the
internal hardware of the computer of Figure 22A. A
bus 2250 serves as the main information highway
interconnecting the other components of the computer.
CPU 2255 is the central processing unit of the system,
performing calculations and logic operations required
to execute programs. Read only memory [illegible] and
random access memory [illegible] constitute the main
memory of the computer. [illegible] one or more disk
drives to the system bus 2250. These disk drives may
be floppy disk drives [illegible], internal or
external hard drives [illegible], or CD ROM or DVD
(Digital Video Disks) drives [illegible]. A display
interface [illegible] permits information from
[illegible] to be viewed on the display.
Communications with external devices can occur over
communications port 2285.

CPU 2200 includes a communications interface 2285
coupled to bus 2250. Communications interface 2285
provides a two-way data communications coupling to a
network link to a local network such as 100D of Figure
1. For example, if communications interface 2285 is 
an integrated services digital network (ISDN) card or 
a modem, communications interface 2285 provides a data 
communications connection to the corresponding type of 
telephone line. If communications interface 2285 is 
a local area network (LAN) card, communications 
interface 2285 provides a data communications 
connection to a compatible LAN. Wireless links are 
also possible. In any such implementation,
communications interface 2285 sends and receives
electrical, electromagnetic or optical signals which 
carry digital data streams representing various types 
of information. 

The network link typically provides data
communications through one or more networks, such as
100A-110D of Figure 1, to other data devices. For
example, the network link may provide a connection
through local network to a host computer or to data
equipment operated by an Internet Service Provider
(ISP). An ISP may in turn provide data communications
services through the world wide packet data
communications network now commonly referred to as the
"Internet". The local network and Internet both use
electrical, electromagnetic or optical signals which
carry digital data streams. The signals through the
various networks and the signals on the network link
and through communications interface 2285, which carry
the digital data to and from CPU 2200, are exemplary
forms of carrier waves transporting the information.
CPU 2200 can send messages and receive data,
including program code, through the network(s),
network link and communications interface 2285. In
the Internet example, a server might transmit
requested code for an application program through
Internet, ISP, local network and communications port
2285. In accordance with the invention, one such
downloaded application may include software implementing
the techniques described herein.

The received code may be executed by processor
2255 as it is received, and/or stored in storage 
devices 2260, 2265 and/or 2271-2273, or other non- 
volatile storage for later execution. In this manner 
CPU 2200 may obtain application code in the form of a 
carrier wave. 

Figure 22C illustrates an exemplary memory medium 
which can be used with drives such as 2271 in Figure 
22B or 2210B in Figure 22A. Typically, memory media 
such as a floppy disk, or a CD ROM, or a Digital Video 
Disk will contain the program information for 
controlling the computer to enable the computer to 
perform its functions in accordance with the 
invention. 

The multicasting approach to server allocation, 
discussed above, provides a simple general purpose 
interface that works across a spectrum of varying user 
needs. It does not unreasonably increase the overhead 
for setting up and operating the multicast for users 
who would like to continue to set up simple open 
meetings. The system provides security even if
outsiders know the IP address and/or port number which
might otherwise enable them to misbehave or behave
maliciously. The system is flexible in that it does
not require the multicast session organizers to know
the identity of all the senders and/or listeners in
advance. It also permits servers or users to 
dynamically join the discussions when desired. 

Even if the system is compromised, it is possible
to reasonably limit the damage caused by excluding
that user or group of users from the multicast
session. The approach described here is also
compatible with current and proposed mechanisms and
protocols for multicasting.

The techniques described provide a variety of
tools which can be used singly or in combination to
allocate connections to servers to provide for load
balancing.

Although the present invention has been described 
and illustrated in detail, it is clearly understood 
that the same is by way of illustration and example 
only and is not to be taken by way of limitation, the 
spirit and scope of the present invention being 
limited only by the terms of the appended claims and 
their equivalents.

What is claimed is:

1. A method of allocating communications to a 
plurality of servers, comprising the steps of: 

a. allocating portions of an address space among
the servers; and

b. changing the portions allocated to a server
while at least one server is handling communications.

2. The method of claim 1 in which changing the
portions allocated to a server is done in accordance 
with a load balancing policy. 

3. The method of claim 2 in which a portion of
an address space is changed for a server by having 
said server execute a source specific leave, a source 
specific join or both. 

4. A method of allocating communications to a 
plurality of servers, comprising the steps of: 

a. directing all communications to be handled by 
said plurality of servers to a multicast address, and 

b. causing said plurality of servers to listen 
to packets originating from respectively different 
portions of the network address space. 

5. The method of claim 4 in which the portions 
of an address space assigned to a particular server 
may be changed to carry out a load balancing policy. 

6. The method of claim 5 in which, when a packet 
is received at a server over a user connection 
originating from a source address in a portion of an 
address space previously serviced by a different 
server but now serviced by said server, the server 
causes disconnection of that connection.

7. The method of claim 5 in which each of said
plurality of servers participates in a control 
multicast channel. 

8. The method of claim 7 in which, when a packet 
is received at a server from a user having a source 
address in a portion of an address space not 
previously serviced by that server but which is 
currently serviced by that server, the server forwards 
that packet over said control multicast channel to all
servers.

9. The method of claim 8 in which the server 
previously handling the user processes the packet 
received over the control multicast channel. 

10. The method of claim 8 in which the server 
previously handling the user notifies the server 
currently servicing the user to forward future packets 
over a point to point connection between the servers. 

11. The method of claim 7 in which, when a 
packet is received at a server from a user having a 
source address in a portion of an address space not 
previously serviced by that server but which is 
currently serviced by that server and for which that 
server does not have current state information, the 
server requests state information over the control 
multicast channel. 

12. The method of claim 11 in which the server 
processes said packet in accordance with state 
information received over said control multicast
channel.

13. The method of claim 5 in which, before a
server ceases servicing a part of a portion of an
address space, the server creates a tag switch path
for each connection to a user in that part of the
address space.

14. The method of claim 13 in which said server
continues forwarding packets from clients using the 
tag switched path after the server no longer services 
that part of the address space. 

15. A method of allocating communications to a 
plurality of servers, comprising the steps of: 

a. establishing a tag switched tree to each
server;

b. directing all communications requests to be
handled by said plurality of servers to a virtual IP
address, and

c. for each connection request, directing one or 
more routers to tag future packets from a user sending 
a connection request to a tag switched tree for a 
server to be assigned in accordance with a load 
balancing policy. 

16. The method of claim 15 further comprising 
the step of changing the load balancing policy using 
a process handling communications requests directed to 
said virtual IP address. 

17. The method of claim 15, further comprising 
the step of changing the relative number of connection 
requests directed to a particular server based on load 
being handled by all servers. 

18. A method of allocating communications, 
comprising the steps of: 

a. sending a SYN packet for a TCP connection
from a user to a first server including a tag and a
cookie;

b. sending a SYN-ACK packet from the first
server to a user including a tag, cookie and
destination information.



WO 99/30460 PCT/US98/26151



19. The method of claim 18, further comprising
the step of re-directing a connection from a first
server to a second server by sending a packet from
said second server to said user containing said tag,
said cookie and new destination information.

20. A method of establishing a TCP connection to
a server, by sending a SYN packet including a tag and
a cookie to said server.

21. The method of claim 20, further comprising
the steps of receiving back from the server a SYN-ACK
packet including a tag and said cookie.

22. A computer network, comprising:

a. at least one user device connected to said
network and sending information to a multicast
address; and

b. a plurality of servers connected to said
network and configured to receive multicast packets
only from users having source addresses from one or
more respective portions of the network address space.

23. The network of claim 22 in which the
portions of the network address space assigned to a
particular server change in accordance with a load
balancing policy.

24. The network of claim 23 in which existing
connections are handled when a connection from one
part of the network address space is transferred to a
different server.

25. The network of claim 22 in which all servers
are connected over a control multicast channel.

26. A computer network comprising:

a. a plurality of routers directing packets over
links of said network;

b. a plurality of servers connected to said
network;

c. at least one user device connected to said
network and configured to send a connection request to
a virtual IP address;

d. a set of said servers servicing connections to
said virtual IP address, and

e. a device connected to said [illegible]
handling said connection [illegible] thereto
notifying one or more of said routers to
direct future packets from said user to one of said
servers [illegible] by said device, in which each server
of said set of said servers establishes a [illegible]
tag switched tree by which connections can be directed
to that server.

27. The network of claim 26 in which said device
notifies at least one router to [illegible]
from said user device to a tag switched tree for a
particular server of said set of servers.

28. A computer network, comprising:

a. at least one user device connected to said
network configured to send a connection request to a
server including a tag and a cookie; and

b. a server replying to said connection request
with tag and said cookie.

29. A computer program product, comprising:

a. a memory medium; and

b. a computer program, stored on said memory
medium comprising instructions for allocating mutually
exclusive portions of an address space [illegible]
servers; and changing the portions allocated to a
server while at least one server is handling
communications.

30. A computer program product, comprising:

a. a memory medium; and

b. a computer program, stored on said memory
medium comprising instructions for directing all
communications to be handled by a plurality of servers
to a multicast address, and causing said plurality of
servers to listen to packets originating from
respectively different portions of a network address
space.

31. A computer program product, comprising:

a. a memory medium; and

b. a computer program, stored on said memory
medium comprising instructions for establishing a tag
switched tree to each server of a set of servers.

32. A computer program product, comprising:

a. a memory medium; and

b. a computer program, stored on said memory
medium comprising instructions for directing one or
more routers to tag packets from a user sending a
connection request to a tag switched tree for a server
to be assigned in accordance with a load balancing
policy.



33. A computer program product, comprising:

a. a memory medium; and

b. a computer program, stored on said memory
medium comprising instructions for sending a SYN
packet from a user to a first server including a tag
and a cookie.

34. A computer program product, comprising:

a. a memory medium; and

b. a computer program, stored on said memory
medium comprising instructions for sending a SYN-ACK
packet from a server to a user including a tag, cookie
and destination information.




35. Computer apparatus comprising:

a. a server configured to participate in a
multicast and to receive packets from users having
addresses in assigned portions of a network address
space.

36. The computer apparatus of claim 35 in which
said server is configured to change the portion of the
address space from which it receives packets in
accordance with a load sharing policy.

37. Computer apparatus comprising:

a. a server configured to disconnect a user when
a packet is received from said user originating from
a source address in a portion of an address space
previously serviced by a different server but now
serviced by said server.

38. Computer apparatus comprising:

a. a server configured to participate in a
control multicast channel and, when a packet is
received at said server from a user having a source
address in a portion of an address space not
previously serviced by that server but which is
currently serviced by that server, to forward that
packet over said control multicast channel to other
servers.

39. Computer apparatus comprising:

a. a server configured to create a tag switch
path for each connection to a user in a [illegible]
network address space which the server will not handle
after a change in address allocation dictated by a
load balancing policy.

40. Computer apparatus comprising:

a. a computer configured to run a process for
handling connection requests to a virtual IP address
and to assign a tag switched tree for a particular
server to a user submitting a connection request.

41. The computer apparatus of claim 40 in which
said process implements a load sharing policy.

42. Computer apparatus comprising:

a. a computer configured to send a SYN packet
from a user to a server including a tag and a cookie.

43. Computer apparatus comprising:

a. a server configured to send a SYN-ACK packet
to a user including a tag, cookie and destination
information.